
SIMSBIOHUB-374: BioHub Feature Submission Schema #209

Merged
merged 20 commits into dev from SIMSBIOHUB-374
Nov 28, 2023
Conversation

NickPhura
Collaborator

@NickPhura NickPhura commented Nov 15, 2023

Overview

Note: Includes breaking migration changes, so we will need to drop dev before merging this. Also, the api/app is likely going to be broken due to having removed lots of old database tables.

Links to Jira tickets

https://apps.nrs.gov.bc.ca/int/jira/browse/SIMSBIOHUB-374

Description of relevant changes

New migrations to add the new feature/search/security/message tables. These tables are all net new; the one exception is the submission_feature table, which hangs off the existing submission table.

New seed to populate the new lookup tables (with realistic, but not final, values).

Removed all old BioHub tables that we either know we won't need or are likely not to need. I updated the existing migrations in place, so we will need to do a clean wipe of dev before merging this PR.

Removed all code related to the database views, including the biohub_dapi_v1 user.

  • Why? They didn't seem to be providing any value. I believe the original theory behind them was that the biohub_api user would have reduced permissions on the database, but in testing the database roles it turned out that biohub_api had more-or-less the same permissions as the regular postgres user. So, to simplify things (and maybe improve performance, if only negligibly), I removed the views and gave the biohub_api user full permissions. We aren't operating in an environment where users have their own database accounts; everything goes through the API, so it may as well have full powers.

I also removed most references to DB_SCHEMA: the biohub schema may as well be hardcoded, given that the earliest migrations all reference it directly anyway, so it's not really configurable.

Added a seed to insert mock test data (disabled by default).

Remove DB_SCHEMA usage in most places.
Remove DB Views
Add Mock test data seed (disabled by default)

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (8b1bcc8) 69.01% compared to head (010f04b) 69.00%.

Files Patch % Lines
api/src/openapi/schemas/biohub-data-submission.ts 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #209      +/-   ##
==========================================
- Coverage   69.01%   69.00%   -0.02%     
==========================================
  Files         213      214       +1     
  Lines        6093     6094       +1     
  Branches      938      938              
==========================================
  Hits         4205     4205              
- Misses       1664     1665       +1     
  Partials      224      224              


Contributor

@curtisupshall curtisupshall left a comment


Code looks good, but when I build I get the following error on the database container:

2023-11-15 23:52:30.133 PST [1] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2023-11-15 23:52:30.133 PST [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-15 23:52:30.133 PST [1] LOG:  listening on IPv6 address "::", port 5432
2023-11-15 23:52:30.142 PST [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-11-15 23:52:30.165 PST [85] LOG:  database system was shut down at 2023-11-15 23:52:30 PST
2023-11-15 23:52:30.171 PST [1] LOG:  database system is ready to accept connections
2023-11-15 23:52:55.132 PST [137] ERROR:  permission denied for table user_identity_source
2023-11-15 23:52:55.132 PST [137] CONTEXT:  SQL statement "select user_identity_source_id from user_identity_source where name = p_user_identity_source_name and record_end_date is null"
        PL/pgSQL function api_set_context(character varying,character varying) line 21 at SQL statement
2023-11-15 23:52:55.132 PST [137] STATEMENT:  select api_set_context($1, $2);

@NickPhura
Collaborator Author

Code looks good, but when I build I get the following error on the database container:

Good catch.

I've pushed a change to fix this. The issue was that I assumed "GRANT ALL ON ..." would do the trick, but that only applies to objects that already exist when the grant runs, not to objects created afterwards. The migration now also changes the default privileges for the biohub_api user, so it will have the proper permissions on all future objects as well.
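For anyone hitting this later, the distinction is roughly the following (assuming the schema is named biohub, per the migrations):

```sql
-- Applies only to tables that exist at the time the GRANT runs:
GRANT ALL ON ALL TABLES IN SCHEMA biohub TO biohub_api;

-- Also applies to tables created afterwards (by the role running the migrations):
ALTER DEFAULT PRIVILEGES IN SCHEMA biohub GRANT ALL ON TABLES TO biohub_api;
```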

@NickPhura NickPhura added the Not Ready For Review Addressing feedback and/or refactoring label Nov 17, 2023
@NickPhura NickPhura removed the Not Ready For Review Addressing feedback and/or refactoring label Nov 17, 2023
@NickPhura NickPhura requested a review from MacQSL November 22, 2023 23:35
@NickPhura NickPhura added the Ready For Review PR is ready for review label Nov 23, 2023
@NickPhura
Collaborator Author

NickPhura commented Nov 24, 2023

@AlfredRosenthal Re Architectural Overview

I should probably move this somewhere more formal, and maybe add some pictures, or something, but hopefully this helps a bit for now.

The schema is basically divided into 4 chunks (not counting users, messages, and other miscellaneous tables): Data, Types/Properties, Search, Security.

Data

- submission (An entire submission record, i.e. a Survey.)
- submission_feature (A single typed element of a submission. Each row represents one item, such as a single observation or a single animal. We also expect there to always be exactly one 'dataset' element, representing the general metadata of the dataset.)
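A minimal DDL sketch of this relationship (column names are illustrative assumptions; the real migrations define more columns, audit fields, etc.):

```sql
CREATE TABLE submission (
  submission_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  uuid           uuid NOT NULL  -- illustrative only; see the actual migration
);

CREATE TABLE submission_feature (
  submission_feature_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  submission_id          integer NOT NULL REFERENCES submission (submission_id),
  feature_type_id        integer NOT NULL,  -- FK to the feature_type lookup
  data                   jsonb NOT NULL     -- the raw feature payload
);
```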

Types/Properties

These lookup tables will be used to provide context to the search_* fields, and also to provide our rules for validating incoming data.
- feature_type (Types of submission_features. Ex: dataset, observation, animal, sample site, etc)
- feature_type_property (Join table. What properties can a given feature type have. Ex: datasets can have 'name' and 'method', while observations can have 'count', 'sex', and 'lifestage')
- feature_property (Known/recognized properties. Ex: name, method, start_date, count, sex, lifestage, description, lat, lng, etc)
- feature_property_type (Types of properties. Ex: string, number, boolean, datetime, spatial, etc)
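As a hypothetical illustration of how the lookups fit together (names follow the descriptions above; the actual seed values differ):

```sql
-- Hypothetical seed values, for illustration only.
INSERT INTO feature_property_type (name) VALUES ('string'), ('number');

INSERT INTO feature_type (name) VALUES ('dataset'), ('observation');

-- 'count' is a number-typed property...
INSERT INTO feature_property (name, feature_property_type_id)
VALUES (
  'count',
  (SELECT feature_property_type_id FROM feature_property_type WHERE name = 'number')
);

-- ...that 'observation' features are allowed to have.
INSERT INTO feature_type_property (feature_type_id, feature_property_id)
VALUES (
  (SELECT feature_type_id FROM feature_type WHERE name = 'observation'),
  (SELECT feature_property_id FROM feature_property WHERE name = 'count')
);
```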

Search

The search tables contain all of the key/value pairs parsed out of the submission_feature properties, based on the known/recognized properties for that feature type. The search tables are split into multiple tables so that the value column in each one can be strongly typed. Each row in one of these tables points to one of the feature_property rows, giving context to what it is. Ex: a search_number record might have a value of 50 and a feature_property_id that points to the 'count' feature_property. Since this search_number record also points to a submission_feature (presumably of type 'observation'), we can run a SQL query to "find all observation records where 'count' is >= 50".
- search_string
- search_number
- search_datetime
- search_spatial
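The "count >= 50" example above might look something like this in SQL (join and column names are assumptions based on the descriptions, not the actual schema):

```sql
-- Find all 'observation' features where the 'count' property is >= 50.
SELECT sf.*
FROM submission_feature sf
JOIN feature_type ft     ON ft.feature_type_id = sf.feature_type_id
JOIN search_number sn    ON sn.submission_feature_id = sf.submission_feature_id
JOIN feature_property fp ON fp.feature_property_id = sn.feature_property_id
WHERE ft.name = 'observation'
  AND fp.name = 'count'
  AND sn.value >= 50;
```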

Security

- security_rule (Contains a list of named rules as defined by the admins. Ex: 'Female Moose in Skeena')

Similar to the search tables, the security condition tables are broken out into multiple tables so that each one can be strongly typed, and only worry about executing the rule against a specific type.
- security_string (Is a child of 1 security rule. Defines security conditions on string values from the search_string table.)
- security_number (Is a child of 1 security rule. Defines security conditions on number values from the search_number table.)
- security_datetime (Is a child of 1 security rule. Defines security conditions on datetime values from the search_datetime table.)
- security_spatial (Is a child of 1 security rule. Defines security conditions on spatial values from the search_spatial table.)

Security Functions

- evaluate_security_rule (Given the id of a submission_feature, runs all security rules against the feature, and returns those rules where ALL conditions returned true. This function runs all 4 of the below functions.)

Each security condition function takes the name of a security condition rule and the id of a submission_feature, and returns true if the condition applies. The functions do a bit of dynamic SQL generation, allowing for some query building for the rules (they read the comparator and value columns of the security_* row to generate SQL).
- evaluate_security_string_condition
- evaluate_security_number_condition
- evaluate_security_datetime_condition
- evaluate_security_spatial_condition
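A rough PL/pgSQL sketch of the dynamic SQL idea behind these condition functions (variable and column names are assumptions; this is not the actual function body):

```sql
-- Hypothetical fragment: build and run the comparison described by a security_number row.
EXECUTE format(
  'SELECT EXISTS (
     SELECT 1
     FROM search_number sn
     WHERE sn.submission_feature_id = $1
       AND sn.feature_property_id = %s
       AND sn.value %s %L
   )',
  p_feature_property_id,  -- which property the condition targets
  p_comparator,           -- e.g. ''>='', read from the security_number row
  p_value                 -- the threshold value, also from the security_number row
)
INTO v_condition_applies
USING p_submission_feature_id;
```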


Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information
0.0% Duplication

Contributor

@KjartanE KjartanE left a comment


Ran locally, works as intended

Collaborator

@MacQSL MacQSL left a comment


Looks good! Built successfully from fresh install

@KjartanE KjartanE merged commit 71c8882 into dev Nov 28, 2023
20 of 22 checks passed
@KjartanE KjartanE deleted the SIMSBIOHUB-374 branch November 28, 2023 21:38
KjartanE pushed a commit that referenced this pull request Nov 29, 2023
* Submission Feature Migrations and Test Seeding
Remove DB_SCHEMA usage in most places.
Remove DB Views
Add Mock test data seed (disabled by default)